Processing Regular Path Queries on Giraph
Authors
Abstract
In the last few years social networks have become ubiquitous. Facebook, LinkedIn, and Twitter now have billions of users, who interact daily and establish new connections. Users and the interactions among them can be naturally represented as data graphs, whose vertices denote users and whose edges are labelled with information about the different interactions. In this paper we sketch a novel approach for processing regular path queries on very large graphs. Our approach exploits Brzozowski’s derivation of regular expressions to allow for a vertex-centric, message-passing-based evaluation of path queries on top of Apache Giraph.
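The idea in the abstract can be illustrated with a minimal sketch, written in plain Python rather than against the Giraph API. It is not the authors' implementation. It assumes a single source vertex, a graph given as an adjacency dictionary, and a hypothetical regex representation (`Empty`, `Eps`, `Sym`, `Cat`, `Alt`, `Star`). Each message carries the Brzozowski derivative of the query with respect to the labels read so far; a vertex is an answer when the residual expression it receives accepts the empty word. The simplification rules are deliberately small and do not guarantee a finite set of derivatives for every expression.

```python
from dataclasses import dataclass

# Hypothetical regex AST; frozen dataclasses are hashable and comparable.
@dataclass(frozen=True)
class Empty: pass            # matches no word
@dataclass(frozen=True)
class Eps: pass              # matches only the empty word
@dataclass(frozen=True)
class Sym: label: str        # matches one edge label
@dataclass(frozen=True)
class Cat: left: object; right: object
@dataclass(frozen=True)
class Alt: left: object; right: object
@dataclass(frozen=True)
class Star: inner: object

def cat(a, b):
    """Concatenation with basic simplification."""
    if isinstance(a, Empty) or isinstance(b, Empty): return Empty()
    if isinstance(a, Eps): return b
    if isinstance(b, Eps): return a
    return Cat(a, b)

def alt(a, b):
    """Union with basic simplification."""
    if isinstance(a, Empty): return b
    if isinstance(b, Empty) or a == b: return a
    return Alt(a, b)

def nullable(r):
    """True if r accepts the empty word."""
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Sym)): return False
    if isinstance(r, Cat): return nullable(r.left) and nullable(r.right)
    return nullable(r.left) or nullable(r.right)  # Alt

def derive(r, a):
    """Brzozowski derivative of r with respect to label a."""
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Sym): return Eps() if r.label == a else Empty()
    if isinstance(r, Alt): return alt(derive(r.left, a), derive(r.right, a))
    if isinstance(r, Star): return cat(derive(r.inner, a), r)
    head = cat(derive(r.left, a), r.right)  # r is a Cat
    return alt(head, derive(r.right, a)) if nullable(r.left) else head

def evaluate(edges, source, query):
    """Simulate supersteps: each message is a (vertex, residual regex) pair."""
    answers, seen = set(), {(source, query)}
    inbox = [(source, query)]
    while inbox:                       # one loop iteration = one superstep
        outbox = []
        for vertex, residual in inbox:
            if nullable(residual):     # the path read so far matches the query
                answers.add(vertex)
            for label, target in edges.get(vertex, []):
                d = derive(residual, label)
                if not isinstance(d, Empty) and (target, d) not in seen:
                    seen.add((target, d))
                    outbox.append((target, d))
        inbox = outbox
    return answers

# Toy graph and the query knows . knows* . worksAt (all names are made up).
edges = {
    "alice": [("knows", "bob")],
    "bob": [("knows", "carol"), ("worksAt", "initech")],
    "carol": [("worksAt", "acme")],
}
query = Cat(Sym("knows"), Cat(Star(Sym("knows")), Sym("worksAt")))
result = evaluate(edges, "alice", query)
```

The `seen` set plays the role of per-vertex state: a vertex ignores a residual expression it has already processed, which is what stops the computation on cyclic graphs when the set of derivatives is finite.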
Similar resources
A scalable graph pattern matching engine on top of Apache Giraph
Many applications are switching to a graph representation of their data in order to take advantage of the connections that exist between entities. Consequently, graph databases are becoming increasingly popular. A query in such a graph database can be expressed as a graph pattern matching problem, which is NP-complete, a problem especially relevant in the presence of large data. To overcome the ...
An Experimental Comparison of Pregel-like Graph Processing Systems
The introduction of Google’s Pregel generated much interest in the field of large-scale graph data processing, inspiring the development of Pregel-like systems such as Apache Giraph, GPS, Mizan, and GraphLab, all of which have appeared in the past two years. To gain an understanding of how Pregel-like systems perform, we conduct a study to experimentally compare Giraph, GPS, Mizan, and GraphLab...
Efficient Processing of XPath Queries Using Indexes
A number of query languages have been proposed in recent times for processing queries on XML and semistructured data. All these query languages make use of regular path expressions to query XML data. To optimize the processing of query paths a number of indexing schemes have also been proposed recently. XPath provides the basis for processing queries on XML data in the form of regular path expr...
Processing Regular Path Queries on Arbitrarily Distributed Data
Regular Path Queries (RPQs) are a type of graph query where answers are pairs of nodes connected by a sequence of edges matching a regular expression. We study the techniques to process such queries on a distributed graph of data. While many techniques assume the location of each data element (node or edge) is known, when the components of the distributed system are autonomous, the data will be...
On decidability of boundedness property for regular path queries
The paper studies the evaluation of regular path queries on semi-structured data, i.e. path queries of the form "find all objects reachable by a path whose labels form a word in p", where p is a regular expression. We use local information expressed in the form of path inclusions in the optimization of path queries. These constraints are of the form L v where L is a regular language and v is a word; L...